rephrased response
When LLMs Can't Help: Real-World Evaluation of LLMs in Nutrition
Li, Karen Jia-Hui, Balloccu, Simone, Dusek, Ondrej, Reiter, Ehud
The increasing trust in large language models (LLMs), especially in the form of chatbots, is often undermined by a lack of extrinsic evaluation. This holds particularly true in nutrition, where randomised controlled trials (RCTs) are the gold standard and experts demand them for evidence-based deployment. LLMs have shown promising results in this field, but only in intrinsic evaluation setups. We address this gap by running the first RCT involving LLMs for nutrition. We augment a rule-based chatbot with two LLM-based features: (1) message rephrasing for conversational variety and engagement, and (2) nutritional counselling through a fine-tuned model. In our seven-week RCT (n=81), we compare chatbot variants with and without LLM integration, measuring effects on dietary outcomes, emotional well-being, and engagement. Although our LLM-based features performed well in intrinsic evaluation, they did not yield consistent benefits in real-world deployment. These results highlight critical gaps between intrinsic evaluation and real-world impact, emphasising the need for interdisciplinary, human-centred approaches. All of our code and results are available at: https://github.com/saeshyra/diet-chatbot-trial
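The first LLM feature, message rephrasing, keeps the rule-based chatbot's content while varying its wording. Below is a minimal Python sketch of how such a hook might wrap templated messages, assuming a simple intent-to-template lookup. The template texts, the `build_message` function, and the `rephrase` callable (a stand-in for any LLM call) are hypothetical and are not taken from the authors' repository. The fallback to the original template, used when no rephraser is configured or when it fails or returns an empty string, is likewise an assumed safeguard.

```python
from typing import Callable, Optional

# Hypothetical rule-based templates keyed by intent (illustrative only,
# not the trial chatbot's actual messages).
TEMPLATES = {
    "log_reminder": "Don't forget to log your meals for today.",
    "goal_met": "Great job! You met your calorie goal yesterday.",
}


def build_message(intent: str,
                  rephrase: Optional[Callable[[str], str]] = None) -> str:
    """Return the templated message, optionally rephrased by an LLM hook.

    `rephrase` is a hypothetical stand-in for an LLM call. When it is None
    (a rule-based-only variant) or raises an exception, the original
    template is returned unchanged.
    """
    base = TEMPLATES[intent]
    if rephrase is None:
        return base
    try:
        candidate = rephrase(base).strip()
    except Exception:
        # Assumed safeguard: any failure in the LLM call falls back to the template.
        return base
    # Also fall back if the rephrasing comes back empty or whitespace-only.
    return candidate or base


if __name__ == "__main__":
    print(build_message("log_reminder"))
    print(build_message("goal_met", lambda s: "Well done! " + s))
```

Passing `rephrase=None` gives a variant without LLM integration, so in this sketch the variants being compared could share the same message-selection logic and differ only in whether the hook is set.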
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- Europe > United Kingdom > Scotland > City of Aberdeen > Aberdeen (0.04)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Consumer Health (1.00)
- Education > Health & Safety > School Nutrition (1.00)
- Health & Medicine > Health Care Technology (0.68)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.46)
How Can I Get It Right? Using GPT to Rephrase Incorrect Trainee Responses
Lin, Jionghao, Han, Zifei, Thomas, Danielle R., Gurung, Ashish, Gupta, Shivang, Aleven, Vincent, Koedinger, Kenneth R.
One-on-one tutoring is widely acknowledged as an effective instructional method, provided that tutors are qualified. However, the high demand for qualified tutors remains a challenge, often making it necessary to train novice tutors (i.e., trainees) to ensure effective tutoring. Research suggests that providing timely explanatory feedback can facilitate trainees' learning. However, providing such feedback is challenging because having human experts assess trainee performance is time-consuming. Inspired by recent advances in large language models (LLMs), our study used the GPT-4 model to build an explanatory feedback system. The system classifies trainees' responses as correct or incorrect and automatically provides template-based feedback, with incorrect responses appropriately rephrased by GPT-4. We conducted our study on 410 trainee responses across three training lessons: Giving Effective Praise, Reacting to Errors, and Determining What Students Know. Our findings indicate that: 1) using a few-shot approach, GPT-4 effectively classifies trainees' responses from the three lessons as correct or incorrect, with an average F1 score of 0.84 and an AUC of 0.85; and 2) using the same few-shot approach, GPT-4 adeptly rephrases incorrect trainee responses into desired responses, achieving performance comparable to that of human experts.
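The two metrics in finding 1 can be made concrete with a short Python sketch. It computes F1 for the positive class and ROC AUC through its rank (Mann-Whitney) interpretation: the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting as half. Treating "correct" as the positive class (label 1) and the toy labels are assumptions for illustration. The abstract does not say which class is positive or how the averages over the three lessons were formed, so this computes each metric for a single set of responses. When the classifier outputs only hard 0/1 labels, as in the toy usage, this AUC reduces to the mean of the true-positive and true-negative rates.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (1 = correct response, an assumed encoding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive outranks random negative), ties = 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels, invented for illustration: 1 = correct, 0 = incorrect.
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    print(round(f1_score(y_true, y_pred), 3))  # 0.667
    print(round(roc_auc(y_true, y_pred), 3))   # 0.667
```

In the toy data, precision and recall are both 2/3, so F1 is 2/3; the true-positive and true-negative rates are also both 2/3, which is why AUC matches.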
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- Europe > Switzerland (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Instructional Material > Course Syllabus & Notes (1.00)
- Education > Educational Technology > Educational Software > Computer Based Training (1.00)
- Education > Educational Setting (1.00)